Part 2: Spatial & Statistical Analysis

Understanding Distribution Patterns and Grade Characteristics

EDA

Spatial Analysis

Statistics

Visualization

Penulis

Ghozian Islam Karami

Diterbitkan

3 Oktober 2025

Introduction

With validated data from Part 1, we now explore where the data is located and what it tells us statistically. This post covers:

Pillar 2: Spatial Distribution Analysis
Pillar 3: Statistical Characterization

These pillars help us understand sampling density, identify spatial patterns, and characterize grade distributions - all critical for informed estimation decisions.

Setup and Data Loading

Kode

library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(DT)
library(RColorBrewer)
library(patchwork)

# Create simulated drilling data
set.seed(123)
n_holes <- 50

collar <- data.frame(
  hole_id = paste0("DDH", sprintf("%03d", 1:n_holes)),
  x = runif(n_holes, 500000, 501000),
  y = runif(n_holes, 9000000, 9001000),
  rl = runif(n_holes, 100, 200)
)

# Assay data with spatial correlation
assay_list <- lapply(1:n_holes, function(i) {
  n_intervals <- sample(15:25, 1)
  depths <- seq(0, by = 2, length.out = n_intervals)
  
  # Add spatial correlation to grades
  distance_from_center <- sqrt((collar$x[i] - 500500)^2 + (collar$y[i] - 9000500)^2)
  grade_factor <- exp(-distance_from_center / 300)
  
  data.frame(
    hole_id = collar$hole_id[i],
    from = depths[-length(depths)],
    to = depths[-1],
    au_ppm = pmax(0, rnorm(n_intervals - 1, mean = 1.5 * grade_factor, sd = 2)),
    ag_ppm = pmax(0, rnorm(n_intervals - 1, mean = 15 * grade_factor, sd = 20)),
    cu_pct = pmax(0, rnorm(n_intervals - 1, mean = 0.5 * grade_factor, sd = 0.8))
  )
})
assay <- do.call(rbind, assay_list)

# Lithology data
litho_codes <- c("Andesite", "Diorite", "Mineralized_Zone", "Altered_Volcanics")
lithology_list <- lapply(collar$hole_id, function(hid) {
  n_litho <- sample(4:8, 1)
  depths <- sort(c(0, sample(5:40, n_litho - 1), 50))
  
  data.frame(
    hole_id = hid,
    from = depths[-length(depths)],
    to = depths[-1],
    lithology = sample(litho_codes, n_litho, replace = TRUE, 
                      prob = c(0.3, 0.2, 0.3, 0.2))
  )
})
lithology <- do.call(rbind, lithology_list)

# Clean and prepare data
collar <- collar %>% janitor::clean_names() %>%
  select(hole_id, x, y, z = rl) %>%
  mutate(hole_id = as.character(hole_id))

assay <- assay %>% janitor::clean_names() %>%
  mutate(hole_id = as.character(hole_id))

lithology <- lithology %>% janitor::clean_names() %>%
  select(hole_id, from, to, lithology) %>%
  mutate(hole_id = as.character(hole_id))

# Merge datasets
collar_std <- collar
assay_std <- assay
lithology_std <- lithology

combined_data <- assay_std %>%
  left_join(collar_std, by = "hole_id") %>%
  mutate(mid_point = from + (to - from) / 2) %>%
  left_join(
    lithology_std %>% rename(litho_from = from, litho_to = to),
    by = join_by(hole_id, between(mid_point, litho_from, litho_to))
  ) %>%
  select(-mid_point, -litho_from, -litho_to)

PILLAR 2: Spatial Distribution Analysis

Objective

Understand where your data is located in 3D space:

Are drillholes evenly distributed?
Where are the dense vs sparse areas?
What are the initial mineralization trends?
Is there clustering or bias in sampling?

Why Spatial Analysis Matters

Spatial patterns directly impact:

Kriging neighborhood selection
Search ellipse orientation
Estimation variance
Classification confidence (Measured, Indicated, Inferred)

2D Drillhole Location Map

Let’s start with a plan view showing collar locations.

Basic Collar Map

Kode

# Create 2D plan view
p_2d <- ggplot(collar_std, aes(x = x, y = y)) +
  geom_point(size = 3, alpha = 0.7, color = "steelblue") +
  geom_text(aes(label = hole_id), 
            hjust = -0.2, vjust = 0.5, 
            size = 2.5, 
            check_overlap = TRUE) +
  coord_equal() +
  labs(
    title = "2D Drillhole Plan View",
    subtitle = paste(nrow(collar_std), "drillholes"),
    x = "Easting (m)",
    y = "Northing (m)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    panel.grid.minor = element_blank()
  )

p_2d

Spatial Pattern Observations

Look for:

Clustering: Groups of closely spaced holes
Gaps: Areas with no drilling
Grid pattern: Regular vs irregular spacing
Directional bias: Preferential drilling directions

Interactive 2D Map with Average Grades

Kode

# Calculate average grade per hole (example: using au_ppm)
avg_grades <- combined_data %>%
  group_by(hole_id, x, y) %>%
  summarise(
    avg_au = mean(au_ppm, na.rm = TRUE),
    max_au = max(au_ppm, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  filter(!is.na(avg_au))

# Create interactive plotly map
plot_ly(data = avg_grades,
        x = ~x, y = ~y,
        type = 'scatter', mode = 'markers',
        marker = list(
          size = 10,
          color = ~avg_au,
          colorscale = 'Viridis',
          showscale = TRUE,
          colorbar = list(title = "Avg Au<br>(ppm)")
        ),
        text = ~paste0(
          "Hole: ", hole_id, "<br>",
          "Avg Au: ", round(avg_au, 3), " ppm<br>",
          "Max Au: ", round(max_au, 3), " ppm"
        ),
        hoverinfo = 'text') %>%
  layout(
    title = "2D Drillhole Map: Average Gold Grades",
    xaxis = list(title = "Easting (m)"),
    yaxis = list(
      title = "Northing (m)",
      scaleanchor = "x",
      scaleratio = 1
    )
  )

Sampling Density Analysis

Kode

# Calculate local sampling density
density_estimate <- MASS::kde2d(
  collar_std$x, 
  collar_std$y, 
  n = 50
)

# Convert to data frame for ggplot
density_df <- expand.grid(
  x = density_estimate$x,
  y = density_estimate$y
) %>%
  mutate(density = as.vector(density_estimate$z))

p_density <- ggplot() +
  geom_tile(data = density_df, aes(x = x, y = y, fill = density)) +
  scale_fill_viridis_c(option = "plasma", name = "Density") +
  geom_point(data = collar_std, aes(x = x, y = y), 
             size = 2, color = "white", alpha = 0.6) +
  coord_equal() +
  labs(
    title = "Sampling Density Heatmap",
    subtitle = "Darker areas indicate higher drillhole density",
    x = "Easting (m)",
    y = "Northing (m)"
  ) +
  theme_minimal()

p_density

Classification Implications

Sampling density directly impacts resource classification:

Dense areas: Higher confidence → Measured/Indicated
Sparse areas: Lower confidence → Indicated/Inferred
No data areas: Extrapolation risk

Composite Length Analysis

Understanding sample support is critical for geostatistics.

Kode

# Calculate composite lengths
combined_data <- combined_data %>%
  mutate(composite_length = to - from)

# Summary statistics
length_summary <- combined_data %>%
  summarise(
    `Min Length` = min(composite_length, na.rm = TRUE),
    `Q1` = quantile(composite_length, 0.25, na.rm = TRUE),
    `Median` = median(composite_length, na.rm = TRUE),
    `Mean` = mean(composite_length, na.rm = TRUE),
    `Q3` = quantile(composite_length, 0.75, na.rm = TRUE),
    `Max Length` = max(composite_length, na.rm = TRUE),
    `Std Dev` = sd(composite_length, na.rm = TRUE)
  )

datatable(length_summary,
          caption = "Table 1: Composite Length Statistics (meters)",
          options = list(dom = 't')) %>%
  formatRound(columns = 1:7, digits = 2)

Kode

# Histogram
p_length <- ggplot(combined_data, aes(x = composite_length)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) +
  geom_vline(xintercept = mean(combined_data$composite_length, na.rm = TRUE),
             color = "red", linetype = "dashed", size = 1) +
  annotate("text", 
           x = mean(combined_data$composite_length, na.rm = TRUE),
           y = Inf,
           label = paste("Mean:", round(mean(combined_data$composite_length, na.rm = TRUE), 2), "m"),
           vjust = 2, color = "red", fontface = "bold") +
  labs(
    title = "Composite Length Distribution",
    subtitle = "Check for consistency in sample support",
    x = "Composite Length (m)",
    y = "Frequency"
  ) +
  theme_minimal()

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Kode

p_length

Variable Sample Support Issues

Inconsistent sample lengths can cause:

Support effect: Grade variance differences
Bias in estimation: Longer samples overweighted
Variogram artifacts: False spatial continuity

Solution: Composite to a consistent length (typically 1m or 2m)

3D Visualization

Understanding the vertical distribution of data is essential.

Kode

# Prepare 3D data with depth
data_3d <- combined_data %>%
  mutate(z_sample = z - ((from + to) / 2)) %>%
  filter(!is.na(x) & !is.na(y) & !is.na(z_sample) & !is.na(au_ppm))

# Sample data for performance (use 50% of data)
set.seed(123)
data_3d_sample <- sample_frac(data_3d, 0.5)

# Create 3D scatter plot
plot_ly(
  data = data_3d_sample,
  x = ~x, y = ~y, z = ~z_sample,
  type = 'scatter3d', mode = 'markers',
  marker = list(
    color = ~au_ppm,
    colorscale = 'Viridis',
    colorbar = list(title = "Au (ppm)"),
    size = 3
  ),
  text = ~paste0(
    "Hole: ", hole_id, "<br>",
    "Depth: ", round(from, 1), "-", round(to, 1), "m<br>",
    "Au: ", round(au_ppm, 3), " ppm"
  ),
  hoverinfo = 'text'
) %>%
  layout(
    title = "3D Drillhole Assay View",
    scene = list(
      xaxis = list(title = "Easting"),
      yaxis = list(title = "Northing"),
      zaxis = list(title = "Elevation (RL)", autorange = "reversed"),
      aspectratio = list(x = 1, y = 1, z = 0.5)
    )
  )

3D Visualization Benefits

3D views help identify:

Vertical trends in mineralization
Dip and plunge of ore bodies
Drilling depth coverage
Spatial clustering in 3D space

PILLAR 3: Statistical Characterization

Objective

Understand the “personality” of your grade data:

What does the distribution look like?
Are there outliers?
Single or multiple populations?
What top-cut value is appropriate?

Univariate Statistics

Summary Statistics by Grade Variable

Kode

# Select numeric grade columns
grade_cols <- c("au_ppm", "ag_ppm", "cu_pct")

# Calculate comprehensive statistics
stats_summary <- combined_data %>%
  select(all_of(grade_cols)) %>%
  pivot_longer(everything(), names_to = "Grade", values_to = "Value") %>%
  group_by(Grade) %>%
  summarise(
    Count = sum(!is.na(Value)),
    Mean = mean(Value, na.rm = TRUE),
    Median = median(Value, na.rm = TRUE),
    `Std Dev` = sd(Value, na.rm = TRUE),
    CV = sd(Value, na.rm = TRUE) / mean(Value, na.rm = TRUE),
    Min = min(Value, na.rm = TRUE),
    Q1 = quantile(Value, 0.25, na.rm = TRUE),
    Q3 = quantile(Value, 0.75, na.rm = TRUE),
    Max = max(Value, na.rm = TRUE),
    Skewness = e1071::skewness(Value, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~round(.x, 3)))

datatable(stats_summary,
          caption = "Table 2: Comprehensive Grade Statistics",
          options = list(scrollX = TRUE, pageLength = 10))

Key Statistical Indicators

CV (Coefficient of Variation): Measure of variability (CV > 1 indicates high variability)
Skewness: Positive skew common in grade data (long right tail)
Mean vs Median: Large differences suggest outliers

Grade Distribution Analysis

Histograms

Kode

# Create histograms for each grade
plot_list <- lapply(grade_cols, function(col) {
  ggplot(combined_data, aes(x = .data[[col]])) +
    geom_histogram(bins = 50, fill = "skyblue", color = "black", alpha = 0.7) +
    geom_vline(xintercept = mean(combined_data[[col]], na.rm = TRUE),
               color = "red", linetype = "dashed", size = 1) +
    geom_vline(xintercept = median(combined_data[[col]], na.rm = TRUE),
               color = "blue", linetype = "dashed", size = 1) +
    labs(
      title = paste("Distribution of", col),
      x = col,
      y = "Frequency"
    ) +
    theme_minimal() +
    theme(plot.title = element_text(size = 10))
})

# Combine plots
(plot_list[[1]] | plot_list[[2]]) / plot_list[[3]]

Log-Normal Probability Plots

Essential for identifying populations and assessing normality.

Kode

# Create Q-Q plots for log-transformed data
qq_plots <- lapply(grade_cols, function(col) {
  data_subset <- combined_data %>%
    filter(!is.na(.data[[col]]) & .data[[col]] > 0)
  
  ggplot(data_subset, aes(sample = log(.data[[col]]))) +
    stat_qq(color = "steelblue", alpha = 0.5) +
    stat_qq_line(color = "red", size = 1) +
    labs(
      title = paste("Log-Normal Q-Q Plot:", col),
      x = "Theoretical Quantiles",
      y = "Sample Quantiles (log scale)"
    ) +
    theme_minimal()
})

(qq_plots[[1]] | qq_plots[[2]]) / qq_plots[[3]]

Population Identification

Breaks or changes in slope on Q-Q plots indicate:

Multiple populations: Different geological/mineralization domains
Outliers: High-grade samples requiring investigation
Mixed distributions: Need for domain separation

Action: Use this information for domaining decisions!

Outlier Detection and Analysis

Boxplots by Lithology

Kode

# Focus on Au for outlier analysis
p_box <- ggplot(combined_data, aes(x = lithology, y = au_ppm, fill = lithology)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Gold Grade Distribution by Lithology",
    subtitle = "Red points indicate statistical outliers (>1.5 IQR)",
    x = "Lithology",
    y = "Au (ppm)"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

p_box

Outlier Table

Kode

# Calculate IQR-based outliers
outlier_threshold <- combined_data %>%
  summarise(
    Q1 = quantile(au_ppm, 0.25, na.rm = TRUE),
    Q3 = quantile(au_ppm, 0.75, na.rm = TRUE)
  ) %>%
  mutate(
    IQR = Q3 - Q1,
    lower_bound = Q1 - 1.5 * IQR,
    upper_bound = Q3 + 1.5 * IQR
  )

outliers <- combined_data %>%
  filter(au_ppm > outlier_threshold$upper_bound | 
         au_ppm < outlier_threshold$lower_bound) %>%
  select(hole_id, from, to, lithology, au_ppm) %>%
  arrange(desc(au_ppm))

datatable(outliers,
          caption = paste("Table 3: Outlier Samples (n =", nrow(outliers), ")"),
          options = list(pageLength = 10, scrollX = TRUE)) %>%
  formatStyle('au_ppm', backgroundColor = '#ffebee', fontWeight = 'bold')

Outlier Treatment Options

Keep as-is: If geologically justified
Top-cut (cap): Limit maximum grade
Remove: If analytical errors suspected
Investigate: Check for sample preparation issues

Never remove outliers without geological justification!

Top-Cut Analysis

Determining appropriate grade capping values.

Kode

# Calculate percentiles
percentiles <- seq(0.9, 1.0, by = 0.01)
percentile_values <- quantile(combined_data$au_ppm, probs = percentiles, na.rm = TRUE)

percentile_df <- data.frame(
  Percentile = percentiles * 100,
  Grade = percentile_values
)

# Plot grade vs percentile
p_percentile <- ggplot(percentile_df, aes(x = Percentile, y = Grade)) +
  geom_line(color = "steelblue", size = 1.5) +
  geom_point(size = 2, color = "darkblue") +
  geom_vline(xintercept = c(95, 97.5, 99), 
             linetype = "dashed", 
             color = c("orange", "red", "darkred")) +
  annotate("text", x = 95, y = max(percentile_values) * 0.9, 
           label = "P95", color = "orange") +
  annotate("text", x = 97.5, y = max(percentile_values) * 0.9, 
           label = "P97.5", color = "red") +
  annotate("text", x = 99, y = max(percentile_values) * 0.9, 
           label = "P99", color = "darkred") +
  labs(
    title = "Top-Cut Analysis: Grade vs Percentile",
    subtitle = "Common thresholds: P95, P97.5, P99",
    x = "Percentile (%)",
    y = "Au Grade (ppm)"
  ) +
  theme_minimal()

p_percentile

Impact of Top-Cutting

Kode

# Compare statistics with different top-cuts
top_cuts <- c(Inf, 
              quantile(combined_data$au_ppm, 0.95, na.rm = TRUE),
              quantile(combined_data$au_ppm, 0.975, na.rm = TRUE),
              quantile(combined_data$au_ppm, 0.99, na.rm = TRUE))

impact_df <- data.frame(
  Scenario = c("No Top-Cut", "P95 Cut", "P97.5 Cut", "P99 Cut"),
  `Cut Value` = round(top_cuts, 2),
  Mean = sapply(top_cuts, function(tc) {
    mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
  }),
  Median = sapply(top_cuts, function(tc) {
    median(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
  }),
  `Std Dev` = sapply(top_cuts, function(tc) {
    sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
  }),
  CV = sapply(top_cuts, function(tc) {
    sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE) / 
      mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
  })
) %>%
  mutate(across(where(is.numeric), ~round(.x, 3)))

datatable(impact_df,
          caption = "Table 4: Impact of Top-Cutting on Statistics",
          options = list(dom = 't'))

Selecting Top-Cut Values

Consider:

Statistical analysis: Probability plots, percentiles
Geological context: Are high grades real or analytical errors?
Impact on resources: Balance between grade and tonnage
Industry standards: P95-P99 common for precious metals
Domaining: Different top-cuts for different domains

Document your decision with geological and statistical justification!

Grade Distribution by Lithology

Understanding how grades vary by rock type.

Kode

# Faceted histograms by lithology
ggplot(combined_data, aes(x = au_ppm)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) +
  facet_wrap(~ lithology, scales = "free_y", ncol = 2) +
  labs(
    title = "Gold Grade Distribution by Lithology",
    x = "Au (ppm)",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(strip.text = element_text(face = "bold", size = 10))

Statistical Comparison by Lithology

Kode

litho_stats <- combined_data %>%
  group_by(lithology) %>%
  summarise(
    Count = n(),
    Mean = mean(au_ppm, na.rm = TRUE),
    Median = median(au_ppm, na.rm = TRUE),
    `Std Dev` = sd(au_ppm, na.rm = TRUE),
    CV = sd(au_ppm, na.rm = TRUE) / mean(au_ppm, na.rm = TRUE),
    Max = max(au_ppm, na.rm = TRUE)
  ) %>%
  arrange(desc(Mean)) %>%
  mutate(across(where(is.numeric) & !Count, ~round(.x, 3)))

datatable(litho_stats,
          caption = "Table 5: Grade Statistics by Lithology",
          options = list(pageLength = 10)) %>%
  formatStyle(
    'Mean',
    background = styleColorBar(litho_stats$Mean, 'lightblue'),
    backgroundSize = '100% 90%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center'
  )

Geological Insights

This analysis reveals:

Host rocks: Which lithologies contain mineralization?
Barren zones: Low-grade lithologies to exclude?
Grade differences: Justification for separate domains?
Continuity: Are grades consistent within lithology?

Summary: Pillars 2 & 3

Key Achievements

Spatial Analysis (Pillar 2): ✓ Visualized drillhole distribution in 2D and 3D ✓ Identified sampling density patterns ✓ Analyzed composite length consistency ✓ Recognized spatial trends in mineralization

Statistical Analysis (Pillar 3): ✓ Characterized grade distributions ✓ Identified outliers and proposed top-cuts ✓ Compared statistics by lithology ✓ Assessed data populations

Checklist: Before Moving to Pillar 4

Spatial patterns understood and documented
Sampling density adequate for classification
Grade populations identified
Outliers investigated and treatment decided
Top-cut values proposed with justification
Lithology-grade relationships documented

Ready for Geological Controls?

With spatial and statistical understanding complete, you’re ready to connect these patterns to geology in Part 3: Geological Controls & Domain Definition.

Best Practices Recap

Common Pitfalls to Avoid

Ignoring spatial patterns - Statistics alone aren’t enough
Over-relying on automation - Always apply geological thinking
Arbitrary top-cutting - Document geological justification
Skipping lithology analysis - Rock types control grades

Documentation Requirements

For JORC compliance, document:

Spatial distribution maps and density analysis
Statistical summary by domain/lithology
Outlier treatment decisions with justification
Top-cut analysis and selected values
Population identification methodology

Previous: ← Part 1 - Data Validation

Next: Part 3 - Geological Controls & Domain Definition →

--- title: "Part 2: Spatial & Statistical Analysis" subtitle: "Understanding Distribution Patterns and Grade Characteristics" author: "Ghozian Islam Karami" date: "2025-10-03" categories: [EDA, Spatial Analysis, Statistics, Visualization] image: "Part2.png" format: html: code-fold: true code-tools: true toc: true toc-depth: 3 --- ## Introduction With validated data from [Part 1](/posts/Data Validation/), we now explore **where** the data is located and **what** it tells us statistically. This post covers: - **Pillar 2**: Spatial Distribution Analysis - **Pillar 3**: Statistical Characterization These pillars help us understand sampling density, identify spatial patterns, and characterize grade distributions - all critical for informed estimation decisions. ## Setup and Data Loading ```{r} #| message: false #| warning: false library(dplyr) library(tidyr) library(ggplot2) library(plotly) library(DT) library(RColorBrewer) library(patchwork) # Create simulated drilling data set.seed(123) n_holes <- 50 collar <- data.frame( hole_id = paste0("DDH", sprintf("%03d", 1:n_holes)), x = runif(n_holes, 500000, 501000), y = runif(n_holes, 9000000, 9001000), rl = runif(n_holes, 100, 200) ) # Assay data with spatial correlation assay_list <- lapply(1:n_holes, function(i) { n_intervals <- sample(15:25, 1) depths <- seq(0, by = 2, length.out = n_intervals) # Add spatial correlation to grades distance_from_center <- sqrt((collar$x[i] - 500500)^2 + (collar$y[i] - 9000500)^2) grade_factor <- exp(-distance_from_center / 300) data.frame( hole_id = collar$hole_id[i], from = depths[-length(depths)], to = depths[-1], au_ppm = pmax(0, rnorm(n_intervals - 1, mean = 1.5 * grade_factor, sd = 2)), ag_ppm = pmax(0, rnorm(n_intervals - 1, mean = 15 * grade_factor, sd = 20)), cu_pct = pmax(0, rnorm(n_intervals - 1, mean = 0.5 * grade_factor, sd = 0.8)) ) }) assay <- do.call(rbind, assay_list) # Lithology data litho_codes <- c("Andesite", "Diorite", "Mineralized_Zone", "Altered_Volcanics") lithology_list <- lapply(collar$hole_id, function(hid) { n_litho <- sample(4:8, 1) depths <- sort(c(0, sample(5:40, n_litho - 1), 50)) data.frame( hole_id = hid, from = depths[-length(depths)], to = depths[-1], lithology = sample(litho_codes, n_litho, replace = TRUE, prob = c(0.3, 0.2, 0.3, 0.2)) ) }) lithology <- do.call(rbind, lithology_list) # Clean and prepare data collar <- collar %>% janitor::clean_names() %>% select(hole_id, x, y, z = rl) %>% mutate(hole_id = as.character(hole_id)) assay <- assay %>% janitor::clean_names() %>% mutate(hole_id = as.character(hole_id)) lithology <- lithology %>% janitor::clean_names() %>% select(hole_id, from, to, lithology) %>% mutate(hole_id = as.character(hole_id)) # Merge datasets collar_std <- collar assay_std <- assay lithology_std <- lithology combined_data <- assay_std %>% left_join(collar_std, by = "hole_id") %>% mutate(mid_point = from + (to - from) / 2) %>% left_join( lithology_std %>% rename(litho_from = from, litho_to = to), by = join_by(hole_id, between(mid_point, litho_from, litho_to)) ) %>% select(-mid_point, -litho_from, -litho_to) ``` --- # PILLAR 2: Spatial Distribution Analysis ## Objective Understand **where** your data is located in 3D space: - Are drillholes evenly distributed? - Where are the dense vs sparse areas? - What are the initial mineralization trends? - Is there clustering or bias in sampling? ::: {.callout-tip} ## Why Spatial Analysis Matters Spatial patterns directly impact: - Kriging neighborhood selection - Search ellipse orientation - Estimation variance - Classification confidence (Measured, Indicated, Inferred) ::: ## 2D Drillhole Location Map Let's start with a plan view showing collar locations. ### Basic Collar Map ```{r} #| fig-width: 10 #| fig-height: 8 # Create 2D plan view p_2d <- ggplot(collar_std, aes(x = x, y = y)) + geom_point(size = 3, alpha = 0.7, color = "steelblue") + geom_text(aes(label = hole_id), hjust = -0.2, vjust = 0.5, size = 2.5, check_overlap = TRUE) + coord_equal() + labs( title = "2D Drillhole Plan View", subtitle = paste(nrow(collar_std), "drillholes"), x = "Easting (m)", y = "Northing (m)" ) + theme_minimal() + theme( plot.title = element_text(size = 14, face = "bold"), panel.grid.minor = element_blank() ) p_2d ``` ::: {.callout-note} ## Spatial Pattern Observations Look for: - **Clustering**: Groups of closely spaced holes - **Gaps**: Areas with no drilling - **Grid pattern**: Regular vs irregular spacing - **Directional bias**: Preferential drilling directions ::: ### Interactive 2D Map with Average Grades ```{r} #| fig-width: 10 #| fig-height: 8 # Calculate average grade per hole (example: using au_ppm) avg_grades <- combined_data %>% group_by(hole_id, x, y) %>% summarise( avg_au = mean(au_ppm, na.rm = TRUE), max_au = max(au_ppm, na.rm = TRUE), .groups = 'drop' ) %>% filter(!is.na(avg_au)) # Create interactive plotly map plot_ly(data = avg_grades, x = ~x, y = ~y, type = 'scatter', mode = 'markers', marker = list( size = 10, color = ~avg_au, colorscale = 'Viridis', showscale = TRUE, colorbar = list(title = "Avg Au (ppm)") ), text = ~paste0( "Hole: ", hole_id, " ", "Avg Au: ", round(avg_au, 3), " ppm ", "Max Au: ", round(max_au, 3), " ppm" ), hoverinfo = 'text') %>% layout( title = "2D Drillhole Map: Average Gold Grades", xaxis = list(title = "Easting (m)"), yaxis = list( title = "Northing (m)", scaleanchor = "x", scaleratio = 1 ) ) ``` ### Sampling Density Analysis ```{r} #| fig-width: 10 #| fig-height: 4 # Calculate local sampling density density_estimate <- MASS::kde2d( collar_std$x, collar_std$y, n = 50 ) # Convert to data frame for ggplot density_df <- expand.grid( x = density_estimate$x, y = density_estimate$y ) %>% mutate(density = as.vector(density_estimate$z)) p_density <- ggplot() + geom_tile(data = density_df, aes(x = x, y = y, fill = density)) + scale_fill_viridis_c(option = "plasma", name = "Density") + geom_point(data = collar_std, aes(x = x, y = y), size = 2, color = "white", alpha = 0.6) + coord_equal() + labs( title = "Sampling Density Heatmap", subtitle = "Darker areas indicate higher drillhole density", x = "Easting (m)", y = "Northing (m)" ) + theme_minimal() p_density ``` ::: {.callout-important} ## Classification Implications Sampling density directly impacts resource classification: - **Dense areas**: Higher confidence → Measured/Indicated - **Sparse areas**: Lower confidence → Indicated/Inferred - **No data areas**: Extrapolation risk ::: ## Composite Length Analysis Understanding sample support is critical for geostatistics. ```{r} #| fig-width: 10 #| fig-height: 5 # Calculate composite lengths combined_data <- combined_data %>% mutate(composite_length = to - from) # Summary statistics length_summary <- combined_data %>% summarise( `Min Length` = min(composite_length, na.rm = TRUE), `Q1` = quantile(composite_length, 0.25, na.rm = TRUE), `Median` = median(composite_length, na.rm = TRUE), `Mean` = mean(composite_length, na.rm = TRUE), `Q3` = quantile(composite_length, 0.75, na.rm = TRUE), `Max Length` = max(composite_length, na.rm = TRUE), `Std Dev` = sd(composite_length, na.rm = TRUE) ) datatable(length_summary, caption = "Table 1: Composite Length Statistics (meters)", options = list(dom = 't')) %>% formatRound(columns = 1:7, digits = 2) # Histogram p_length <- ggplot(combined_data, aes(x = composite_length)) + geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) + geom_vline(xintercept = mean(combined_data$composite_length, na.rm = TRUE), color = "red", linetype = "dashed", size = 1) + annotate("text", x = mean(combined_data$composite_length, na.rm = TRUE), y = Inf, label = paste("Mean:", round(mean(combined_data$composite_length, na.rm = TRUE), 2), "m"), vjust = 2, color = "red", fontface = "bold") + labs( title = "Composite Length Distribution", subtitle = "Check for consistency in sample support", x = "Composite Length (m)", y = "Frequency" ) + theme_minimal() p_length ``` ::: {.callout-warning} ## Variable Sample Support Issues Inconsistent sample lengths can cause: - **Support effect**: Grade variance differences - **Bias in estimation**: Longer samples overweighted - **Variogram artifacts**: False spatial continuity **Solution**: Composite to a consistent length (typically 1m or 2m) ::: ## 3D Visualization Understanding the vertical distribution of data is essential. ```{r} #| fig-width: 10 #| fig-height: 8 # Prepare 3D data with depth data_3d <- combined_data %>% mutate(z_sample = z - ((from + to) / 2)) %>% filter(!is.na(x) & !is.na(y) & !is.na(z_sample) & !is.na(au_ppm)) # Sample data for performance (use 50% of data) set.seed(123) data_3d_sample <- sample_frac(data_3d, 0.5) # Create 3D scatter plot plot_ly( data = data_3d_sample, x = ~x, y = ~y, z = ~z_sample, type = 'scatter3d', mode = 'markers', marker = list( color = ~au_ppm, colorscale = 'Viridis', colorbar = list(title = "Au (ppm)"), size = 3 ), text = ~paste0( "Hole: ", hole_id, " ", "Depth: ", round(from, 1), "-", round(to, 1), "m ", "Au: ", round(au_ppm, 3), " ppm" ), hoverinfo = 'text' ) %>% layout( title = "3D Drillhole Assay View", scene = list( xaxis = list(title = "Easting"), yaxis = list(title = "Northing"), zaxis = list(title = "Elevation (RL)", autorange = "reversed"), aspectratio = list(x = 1, y = 1, z = 0.5) ) ) ``` ::: {.callout-tip} ## 3D Visualization Benefits 3D views help identify: - Vertical trends in mineralization - Dip and plunge of ore bodies - Drilling depth coverage - Spatial clustering in 3D space ::: --- # PILLAR 3: Statistical Characterization ## Objective Understand the **"personality"** of your grade data: - What does the distribution look like? - Are there outliers? - Single or multiple populations? - What top-cut value is appropriate? ## Univariate Statistics ### Summary Statistics by Grade Variable ```{r} # Select numeric grade columns grade_cols <- c("au_ppm", "ag_ppm", "cu_pct") # Calculate comprehensive statistics stats_summary <- combined_data %>% select(all_of(grade_cols)) %>% pivot_longer(everything(), names_to = "Grade", values_to = "Value") %>% group_by(Grade) %>% summarise( Count = sum(!is.na(Value)), Mean = mean(Value, na.rm = TRUE), Median = median(Value, na.rm = TRUE), `Std Dev` = sd(Value, na.rm = TRUE), CV = sd(Value, na.rm = TRUE) / mean(Value, na.rm = TRUE), Min = min(Value, na.rm = TRUE), Q1 = quantile(Value, 0.25, na.rm = TRUE), Q3 = quantile(Value, 0.75, na.rm = TRUE), Max = max(Value, na.rm = TRUE), Skewness = e1071::skewness(Value, na.rm = TRUE) ) %>% mutate(across(where(is.numeric), ~round(.x, 3))) datatable(stats_summary, caption = "Table 2: Comprehensive Grade Statistics", options = list(scrollX = TRUE, pageLength = 10)) ``` ::: {.callout-note} ## Key Statistical Indicators - **CV (Coefficient of Variation)**: Measure of variability (CV > 1 indicates high variability) - **Skewness**: Positive skew common in grade data (long right tail) - **Mean vs Median**: Large differences suggest outliers ::: ## Grade Distribution Analysis ### Histograms ```{r} #| fig-width: 10 #| fig-height: 8 # Create histograms for each grade plot_list <- lapply(grade_cols, function(col) { ggplot(combined_data, aes(x = .data[[col]])) + geom_histogram(bins = 50, fill = "skyblue", color = "black", alpha = 0.7) + geom_vline(xintercept = mean(combined_data[[col]], na.rm = TRUE), color = "red", linetype = "dashed", size = 1) + geom_vline(xintercept = median(combined_data[[col]], na.rm = TRUE), color = "blue", linetype = "dashed", size = 1) + labs( title = paste("Distribution of", col), x = col, y = "Frequency" ) + theme_minimal() + theme(plot.title = element_text(size = 10)) }) # Combine plots (plot_list[[1]] | plot_list[[2]]) / plot_list[[3]] ``` ### Log-Normal Probability Plots Essential for identifying populations and assessing normality. ```{r} #| fig-width: 10 #| fig-height: 6 # Create Q-Q plots for log-transformed data qq_plots <- lapply(grade_cols, function(col) { data_subset <- combined_data %>% filter(!is.na(.data[[col]]) & .data[[col]] > 0) ggplot(data_subset, aes(sample = log(.data[[col]]))) + stat_qq(color = "steelblue", alpha = 0.5) + stat_qq_line(color = "red", size = 1) + labs( title = paste("Log-Normal Q-Q Plot:", col), x = "Theoretical Quantiles", y = "Sample Quantiles (log scale)" ) + theme_minimal() }) (qq_plots[[1]] | qq_plots[[2]]) / qq_plots[[3]] ``` ::: {.callout-important} ## Population Identification Breaks or changes in slope on Q-Q plots indicate: - **Multiple populations**: Different geological/mineralization domains - **Outliers**: High-grade samples requiring investigation - **Mixed distributions**: Need for domain separation **Action**: Use this information for domaining decisions! ::: ## Outlier Detection and Analysis ### Boxplots by Lithology ```{r} #| fig-width: 10 #| fig-height: 6 # Focus on Au for outlier analysis p_box <- ggplot(combined_data, aes(x = lithology, y = au_ppm, fill = lithology)) + geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2) + scale_fill_brewer(palette = "Set2") + labs( title = "Gold Grade Distribution by Lithology", subtitle = "Red points indicate statistical outliers (>1.5 IQR)", x = "Lithology", y = "Au (ppm)" ) + theme_minimal() + theme( axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none" ) p_box ``` ### Outlier Table ```{r} # Calculate IQR-based outliers outlier_threshold <- combined_data %>% summarise( Q1 = quantile(au_ppm, 0.25, na.rm = TRUE), Q3 = quantile(au_ppm, 0.75, na.rm = TRUE) ) %>% mutate( IQR = Q3 - Q1, lower_bound = Q1 - 1.5 * IQR, upper_bound = Q3 + 1.5 * IQR ) outliers <- combined_data %>% filter(au_ppm > outlier_threshold$upper_bound | au_ppm < outlier_threshold$lower_bound) %>% select(hole_id, from, to, lithology, au_ppm) %>% arrange(desc(au_ppm)) datatable(outliers, caption = paste("Table 3: Outlier Samples (n =", nrow(outliers), ")"), options = list(pageLength = 10, scrollX = TRUE)) %>% formatStyle('au_ppm', backgroundColor = '#ffebee', fontWeight = 'bold') ``` ::: {.callout-warning} ## Outlier Treatment Options 1. **Keep as-is**: If geologically justified 2. **Top-cut (cap)**: Limit maximum grade 3. **Remove**: If analytical errors suspected 4. **Investigate**: Check for sample preparation issues **Never** remove outliers without geological justification! ::: ## Top-Cut Analysis Determining appropriate grade capping values. ```{r} #| fig-width: 10 #| fig-height: 5 # Calculate percentiles percentiles <- seq(0.9, 1.0, by = 0.01) percentile_values <- quantile(combined_data$au_ppm, probs = percentiles, na.rm = TRUE) percentile_df <- data.frame( Percentile = percentiles * 100, Grade = percentile_values ) # Plot grade vs percentile p_percentile <- ggplot(percentile_df, aes(x = Percentile, y = Grade)) + geom_line(color = "steelblue", size = 1.5) + geom_point(size = 2, color = "darkblue") + geom_vline(xintercept = c(95, 97.5, 99), linetype = "dashed", color = c("orange", "red", "darkred")) + annotate("text", x = 95, y = max(percentile_values) * 0.9, label = "P95", color = "orange") + annotate("text", x = 97.5, y = max(percentile_values) * 0.9, label = "P97.5", color = "red") + annotate("text", x = 99, y = max(percentile_values) * 0.9, label = "P99", color = "darkred") + labs( title = "Top-Cut Analysis: Grade vs Percentile", subtitle = "Common thresholds: P95, P97.5, P99", x = "Percentile (%)", y = "Au Grade (ppm)" ) + theme_minimal() p_percentile ``` ### Impact of Top-Cutting ```{r} # Compare statistics with different top-cuts top_cuts <- c(Inf, quantile(combined_data$au_ppm, 0.95, na.rm = TRUE), quantile(combined_data$au_ppm, 0.975, na.rm = TRUE), quantile(combined_data$au_ppm, 0.99, na.rm = TRUE)) impact_df <- data.frame( Scenario = c("No Top-Cut", "P95 Cut", "P97.5 Cut", "P99 Cut"), `Cut Value` = round(top_cuts, 2), Mean = sapply(top_cuts, function(tc) { mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE) }), Median = sapply(top_cuts, function(tc) { median(pmin(combined_data$au_ppm, tc), na.rm = TRUE) }), `Std Dev` = sapply(top_cuts, function(tc) { sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE) }), CV = sapply(top_cuts, function(tc) { sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE) / mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE) }) ) %>% mutate(across(where(is.numeric), ~round(.x, 3))) datatable(impact_df, caption = "Table 4: Impact of Top-Cutting on Statistics", options = list(dom = 't')) ``` ::: {.callout-tip} ## Selecting Top-Cut Values Consider: 1. **Statistical analysis**: Probability plots, percentiles 2. **Geological context**: Are high grades real or analytical errors? 3. **Impact on resources**: Balance between grade and tonnage 4. **Industry standards**: P95-P99 common for precious metals 5. **Domaining**: Different top-cuts for different domains **Document your decision** with geological and statistical justification! ::: ## Grade Distribution by Lithology Understanding how grades vary by rock type. ```{r} #| fig-width: 10 #| fig-height: 8 # Faceted histograms by lithology ggplot(combined_data, aes(x = au_ppm)) + geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) + facet_wrap(~ lithology, scales = "free_y", ncol = 2) + labs( title = "Gold Grade Distribution by Lithology", x = "Au (ppm)", y = "Frequency" ) + theme_minimal() + theme(strip.text = element_text(face = "bold", size = 10)) ``` ### Statistical Comparison by Lithology ```{r} litho_stats <- combined_data %>% group_by(lithology) %>% summarise( Count = n(), Mean = mean(au_ppm, na.rm = TRUE), Median = median(au_ppm, na.rm = TRUE), `Std Dev` = sd(au_ppm, na.rm = TRUE), CV = sd(au_ppm, na.rm = TRUE) / mean(au_ppm, na.rm = TRUE), Max = max(au_ppm, na.rm = TRUE) ) %>% arrange(desc(Mean)) %>% mutate(across(where(is.numeric) & !Count, ~round(.x, 3))) datatable(litho_stats, caption = "Table 5: Grade Statistics by Lithology", options = list(pageLength = 10)) %>% formatStyle( 'Mean', background = styleColorBar(litho_stats$Mean, 'lightblue'), backgroundSize = '100% 90%', backgroundRepeat = 'no-repeat', backgroundPosition = 'center' ) ``` ::: {.callout-note} ## Geological Insights This analysis reveals: - **Host rocks**: Which lithologies contain mineralization? - **Barren zones**: Low-grade lithologies to exclude? - **Grade differences**: Justification for separate domains? - **Continuity**: Are grades consistent within lithology? ::: ## Summary: Pillars 2 & 3 ### Key Achievements **Spatial Analysis (Pillar 2):** ✓ Visualized drillhole distribution in 2D and 3D ✓ Identified sampling density patterns ✓ Analyzed composite length consistency ✓ Recognized spatial trends in mineralization **Statistical Analysis (Pillar 3):** ✓ Characterized grade distributions ✓ Identified outliers and proposed top-cuts ✓ Compared statistics by lithology ✓ Assessed data populations ### Checklist: Before Moving to Pillar 4 - [ ] Spatial patterns understood and documented - [ ] Sampling density adequate for classification - [ ] Grade populations identified - [ ] Outliers investigated and treatment decided - [ ] Top-cut values proposed with justification - [ ] Lithology-grade relationships documented ::: {.callout-tip} ## Ready for Geological Controls? With spatial and statistical understanding complete, you're ready to connect these patterns to geology in [Part 3: Geological Controls & Domain Definition](/posts/Geological Domain/). ::: ## Best Practices Recap ### Common Pitfalls to Avoid 1. **Ignoring spatial patterns** - Statistics alone aren't enough 2. **Over-relying on automation** - Always apply geological thinking 3. **Arbitrary top-cutting** - Document geological justification 4. **Skipping lithology analysis** - Rock types control grades ### Documentation Requirements For JORC compliance, document: - Spatial distribution maps and density analysis - Statistical summary by domain/lithology - Outlier treatment decisions with justification - Top-cut analysis and selected values - Population identification methodology --- *Previous: [← Part 1 - Data Validation](/posts/Data Validation/)* *Next: [Part 3 - Geological Controls & Domain Definition →](/posts/Geological Domain/)*